Systematic analysis of agreement between metrics and peer review in the UK REF
When performing a national research assessment, some countries rely on
citation metrics whereas others, such as the UK, primarily use peer review. In
the influential Metric Tide report, a low agreement between metrics and peer
review in the UK Research Excellence Framework (REF) was found. However,
earlier studies observed much higher agreement between metrics and peer review
in the REF and argued in favour of using metrics. This shows that there is
considerable ambiguity in the discussion on agreement between metrics and peer
review. We provide clarity in this discussion by considering four important
points: (1) the level of aggregation of the analysis; (2) the use of either a
size-dependent or a size-independent perspective; (3) the suitability of
different measures of agreement; and (4) the uncertainty in peer review. In the
context of the REF, we argue that agreement between metrics and peer review
should be assessed at the institutional level rather than at the publication
level. Both a size-dependent and a size-independent perspective are relevant in
the REF. The interpretation of correlations may be problematic and as an
alternative we therefore use measures of agreement that are based on the
absolute or relative differences between metrics and peer review. To get an
idea of the uncertainty in peer review, we rely on a model to bootstrap peer
review outcomes. We conclude that particularly in Physics, Clinical Medicine,
and Public Health, metrics agree quite well with peer review and may offer an
alternative to peer review.
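As a rough illustration of agreement measures based on absolute or relative differences, the sketch below computes a median absolute difference and a median absolute percentage difference between metric-based and peer-review-based scores per institution. The function names, the toy scores, and the plain resampling of institutions are assumptions made for illustration; the abstract itself describes a model-based bootstrap of peer review outcomes, which is not reproduced here.

```python
import random
from statistics import median


def median_abs_diff(metric, peer):
    """Median of |metric - peer| over institutions."""
    return median(abs(m - p) for m, p in zip(metric, peer))


def median_abs_pct_diff(metric, peer):
    """Median of |metric - peer| / peer, in percent (peer scores must be non-zero)."""
    return median(abs(m - p) / p * 100 for m, p in zip(metric, peer))


def bootstrap_interval(metric, peer, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for `stat`, resampling institutions with replacement.

    This is an ordinary nonparametric bootstrap, used here as a stand-in.
    """
    rng = random.Random(seed)
    n = len(metric)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([metric[i] for i in idx], [peer[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical institutional scores (not data from the study).
metric_scores = [10, 20, 30, 40]
peer_scores = [12, 18, 33, 40]

mad = median_abs_diff(metric_scores, peer_scores)
mapd = median_abs_pct_diff(metric_scores, peer_scores)
interval = bootstrap_interval(metric_scores, peer_scores, median_abs_diff)
```

Unlike a correlation coefficient, both measures are expressed in the units of the scores (or as a percentage of them), which makes them easier to interpret at the institutional level.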
From Louvain to Leiden: guaranteeing well-connected communities
Community detection is often used to understand the structure of large and
complex networks. One of the most popular algorithms for uncovering community
structure is the so-called Louvain algorithm. We show that this algorithm has a
major defect that largely went unnoticed until now: the Louvain algorithm may
yield arbitrarily badly connected communities. In the worst case, communities
may even be disconnected, especially when running the algorithm iteratively. In
our experimental analysis, we observe that up to 25% of the communities are
badly connected and up to 16% are disconnected. To address this problem, we
introduce the Leiden algorithm. We prove that the Leiden algorithm yields
communities that are guaranteed to be connected. In addition, we prove that,
when the Leiden algorithm is applied iteratively, it converges to a partition
in which all subsets of all communities are locally optimally assigned.
Furthermore, by relying on a fast local move approach, the Leiden algorithm
runs faster than the Louvain algorithm. We demonstrate the performance of the
Leiden algorithm for several benchmark and real-world networks. We find that
the Leiden algorithm is faster than the Louvain algorithm and uncovers better
partitions, in addition to providing explicit guarantees.
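The defect described above can be checked for any given partition. The following minimal sketch is not the Leiden algorithm itself; it only tests whether each community induces a connected subgraph, using a breadth-first search restricted to intra-community edges. The function name and the input format (an edge list plus a node-to-community mapping) are assumptions for illustration.

```python
from collections import defaultdict, deque


def disconnected_communities(edges, membership):
    """Return the ids of communities whose induced subgraph is not connected."""
    # Keep only edges whose endpoints share a community.
    adj = defaultdict(set)
    for u, v in edges:
        if membership[u] == membership[v]:
            adj[u].add(v)
            adj[v].add(u)

    members = defaultdict(list)
    for node, comm in membership.items():
        members[comm].append(node)

    bad = set()
    for comm, nodes in members.items():
        # Breadth-first search from an arbitrary member of the community.
        seen = {nodes[0]}
        queue = deque([nodes[0]])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        if len(seen) < len(nodes):
            bad.add(comm)
    return bad


# A path 0-1-2-3 with two alternative partitions.
path = [(0, 1), (1, 2), (2, 3)]
alternating = disconnected_communities(path, {0: "a", 1: "b", 2: "a", 3: "b"})
contiguous = disconnected_communities(path, {0: "a", 1: "a", 2: "b", 3: "b"})
```

A check of this kind can be run on the output of any community detection method to see whether it returns disconnected communities on a given network.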
Large network community detection by fast label propagation
Many networks exhibit some community structure. There exists a wide variety
of approaches to detect communities in networks, each offering different
interpretations and associated algorithms. For large networks, there is the
additional requirement of speed. In this context, the so-called label
propagation algorithm (LPA) was proposed, which runs in near-linear time. In
partitions uncovered by LPA, each node is ensured to have most links to its
assigned community. We here propose a fast variant of LPA (FLPA) that is based
on processing a queue of nodes whose neighbourhood recently changed. We test
FLPA exhaustively on benchmark networks and empirical networks, finding that it
can run up to 700 times faster than LPA. In partitions found by FLPA, we prove
that each node is again guaranteed to have most links to its assigned
community. Our results show that FLPA is generally preferable to LPA.
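The queue-based idea can be sketched as follows. This is a simplified reading of the description above, not the authors' implementation: the tie-breaking rule (keep the current label when it is already among the most frequent, otherwise pick a most frequent label at random) and the adjacency-dictionary input are assumptions, and the graph is assumed to be undirected without self-loops.

```python
import random
from collections import Counter, deque


def fast_label_propagation(adj, seed=0):
    """Queue-based label propagation on an undirected graph.

    adj maps each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own community
    order = list(adj)
    rng.shuffle(order)
    queue = deque(order)
    queued = set(order)

    while queue:
        v = queue.popleft()
        queued.discard(v)
        if not adj[v]:
            continue  # isolated nodes keep their own label
        counts = Counter(labels[u] for u in adj[v])
        best = max(counts.values())
        if counts[labels[v]] == best:
            continue  # current label is already a most frequent one
        candidates = sorted(label for label, c in counts.items() if c == best)
        new_label = rng.choice(candidates)
        labels[v] = new_label
        # Only neighbours that do not share the new label may need to be revisited.
        for u in adj[v]:
            if labels[u] != new_label and u not in queued:
                queue.append(u)
                queued.add(u)
    return labels


def has_local_majority(adj, labels):
    """Check that every node's label is a most frequent label among its neighbours."""
    for v, neighbours in adj.items():
        if not neighbours:
            continue
        counts = Counter(labels[u] for u in neighbours)
        if counts[labels[v]] != max(counts.values()):
            return False
    return True


# Two disjoint triangles as a toy network.
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
result = fast_label_propagation(triangles)
```

Because a node is revisited only when a neighbour has just changed label, converged regions of the network are not processed again, which is where the speed-up over repeatedly sweeping all nodes would come from.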
Italian sociologists: A community of disconnected groups
Examining coauthorship networks is key to studying scientific collaboration patterns and the structural characteristics of scientific communities. Here, we studied coauthorship networks of sociologists in Italy, using temporal and multi-level quantitative analysis. By looking at publications indexed in Scopus, we detected research communities among Italian sociologists. We found that Italian sociologists are fractured into many disconnected groups. The giant connected component of Italian sociology could be split into five main groups with a mixture of three main disciplinary topics: sociology of culture and communication (present in two groups), economic sociology (present in three groups) and general sociology (present in three groups). By applying an exponential random graph model, we found that collaboration ties are mainly driven by the research interests of these groups. Other factors, such as preferential attachment, gender and affiliation homophily, are also important, but the effect of gender fades away once other factors are controlled for. Our research shows the advantages of multi-level and temporal network analysis in revealing the complexity of scientific collaboration patterns.
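To make the notions of a coauthorship network and its giant connected component concrete, the sketch below builds weighted coauthorship ties from per-paper author lists and groups authors into connected components with a union-find structure. The author names and the input format are hypothetical; the community detection and exponential random graph modelling mentioned in the abstract are not covered.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers):
    """Count how many papers each pair of authors has written together."""
    weights = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


def connected_components(nodes, edges):
    """Group nodes into connected components, largest first (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical author lists, one per publication.
papers = [
    ["Rossi", "Bianchi"],
    ["Bianchi", "Verdi"],
    ["Russo", "Ferrari"],
    ["Conti"],
]
edges = coauthorship_edges(papers)
authors = {a for paper in papers for a in paper}
components = connected_components(authors, edges)
giant = components[0]
```

Here `giant` is the largest group of authors linked through chains of joint publications; authors with no path of coauthorship to it fall into separate, disconnected components.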
IGraph/M: graph theory and network analysis for Mathematica
IGraph/M is an efficient general purpose graph theory and network analysis
package for Mathematica. IGraph/M serves as the Wolfram Language interface to
the igraph C library, and also provides several unique pieces of functionality
not yet present in igraph, but made possible by combining its capabilities with
Mathematica's. The package is designed to support both graph-theoretical
research and the analysis of large-scale empirical networks.
Comment: submitted to Journal of Open Source Software on August 30, 202
Exploring the mobility of mobile phone users
Mobile phone datasets allow for the analysis of human behavior on an
unprecedented scale. The social network, temporal dynamics and mobile behavior
of mobile phone users have often been analyzed independently from each other
using mobile phone datasets. In this article, we explore the connections
between various features of human behavior extracted from a large mobile phone
dataset. Our observations are based on the analysis of communication data of
100,000 anonymized and randomly chosen individuals in a dataset of
communications in Portugal. We show that clustering and principal component
analysis allow for a significant dimension reduction with limited loss of
information. The most important features are related to geographical location.
In particular, we observe that most people spend most of their time at only a
few locations. With the help of clustering methods, we then robustly identify
home and office locations and compare the results with official census data.
Finally, we analyze the geographic spread of users' frequent locations and show
that commuting distances can be reasonably well explained by a gravity model.
Comment: 16 pages, 12 figures
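For readers unfamiliar with gravity models, a minimal sketch of the common textbook form follows: the expected flow between two locations grows with the product of their populations and decays with a power of the distance between them. The abstract does not state the exact functional form or parameter values used, so the function name, the constant `k`, and the exponent `beta` below are assumptions.

```python
def gravity_flow(pop_origin, pop_destination, distance, k=1.0, beta=2.0):
    """Expected flow under a basic gravity model: k * P_i * P_j / d**beta.

    k and beta are free parameters that would normally be fitted to data;
    the defaults here are placeholders, not values from the study.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * pop_origin * pop_destination / distance ** beta


# With beta = 2, doubling the distance cuts the expected flow to a quarter.
near = gravity_flow(100, 200, 10)
far = gravity_flow(100, 200, 20)
```

In practice, `k` and `beta` are typically estimated by regressing the logarithm of observed flows on the logarithms of the populations and the distance.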
Perspectives on Scientific Error
Theoretical arguments and empirical investigations indicate that a high proportion of published findings do not replicate and are likely false. The current position paper provides a broad perspective on scientific error, which may lead to replication failures. This broad perspective focuses on reform history and on opportunities for future reform. We organize our perspective along four main themes: institutional reform, methodological reform, statistical reform and publishing reform. For each theme, we illustrate potential errors by narrating the story of a fictional researcher during the research cycle. We discuss future opportunities for reform. The resulting agenda provides a resource to usher in an era that is marked by a research culture that is less error-prone and a scientific publication landscape with fewer spurious findings.